chore(release): 0.24.0 — payload-shape telemetry #135
Conversation
Adds payload-shape instrumentation to MCP telemetry. New doubles 3-7 capture wire size and cl100k_base token counts for every request and response, plus the wall-clock cost of tokenization itself.

Implementation:
- New module workers/src/tokenize.ts (sketched below) wraps gpt-tokenizer/encoding/cl100k_base with a lazy-loaded singleton encoder and a safe-failure surface (countTokensSafe, measurePayloadShape). A module-level promise caches the encoder across requests within a worker isolate; the cold path pays the parse once, all subsequent calls are warm.
- Refactors the recordTelemetry signature in workers/src/telemetry.ts to accept a pre-read body string + optional PayloadShape rather than reading the request body itself. Schema doc comment expanded to describe doubles 3-7. Now synchronous (no longer returns a Promise) since the caller's measurement work happens in waitUntil.
- Updates the workers/src/index.ts call site: clones the response (when Content-Type is application/json), reads request and response bodies in the waitUntil background task, calls measurePayloadShape, then recordTelemetry. Zero user-facing latency added — measurement happens after the response is sent. SSE responses skip body measurement.

Tokenizer choice:
- gpt-tokenizer/encoding/cl100k_base over @anthropic-ai/tokenizer. Empirical bench (Node v22, same V8 as Workers): cl100k median 0.05-1.3ms across 200B-50KB payloads vs 0.30-7.4ms for the Anthropic WASM tokenizer. p95 is dramatically better (no WASM memory-grow spikes).
- Token count diverges ~3-4% from the Claude tokenizer on English prose; an acceptable noise floor for shape analysis (we are not billing).
- Bundle delta measured empirically via esbuild: 432KB gzipped (993KB minified). Comfortably within paid-tier Workers limits.

Failure handling:
- Any tokenizer load or encode failure → countTokensSafe returns null, treated as 0 in telemetry. tokenize_ms = 0 alongside non-zero bytes signals a measurement skip in the data.
- Telemetry must never break MCP requests — all measurement code is wrapped in try/catch within the waitUntil block.

Tests:
- New workers/test/tokenize.test.mjs (8 cases, all pass): empty input, positive integer output, scaling with length, full PayloadShape contract, UTF-8 byte length correctness, JSON-RPC payload tokenization, tokenize_ms finiteness, empty-response (SSE) skip path.
- Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports; exercises the same TypeScript surface that ships in the worker bundle.
- npm run typecheck clean.

Methodology note: this change exists because three theoretical objections (bundle bloat, vodka violation, tokenizer-choice domain opinion) were falsified by a five-minute bench. See klappy://canon/constraints/measure-before-you-object and klappy://canon/observations/performed-prudence-anti-pattern (drafts pending merge into klappy.dev).
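A minimal sketch of the lazy-singleton pattern described above, assuming the names from this commit message (countTokensSafe, the module path, the cached-promise shape); not the shipped code:

```ts
// workers/src/tokenize.ts (sketch): lazy-loaded singleton cl100k encoder.
let encoderPromise: Promise<typeof import('gpt-tokenizer/encoding/cl100k_base')> | null = null;

// Module-level promise: the first call in an isolate pays the encoder
// parse; every later call reuses the cached module.
function getEncoder() {
  encoderPromise ??= import('gpt-tokenizer/encoding/cl100k_base');
  return encoderPromise;
}

// Safe-failure surface: any load or encode failure yields null, which the
// telemetry layer records as 0 instead of throwing into the request path.
export async function countTokensSafe(text: string): Promise<number | null> {
  try {
    if (text.length === 0) return 0; // trivial short-circuit: encoder never runs
    const { encode } = await getEncoder();
    return encode(text).length;
  } catch {
    return null;
  }
}
```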
Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture (sketch below), then exercises recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads. Verifies end-to-end that the full PayloadShape lands in doubles 3-7, that bytes match TextEncoder UTF-8 length, that batch JSON-RPC produces one point per message, and that malformed input is silently dropped. 7/7 cases pass.

Notable: the realistic ~8KB response measured tokenize_ms=0.948ms — within 14% of the bench prediction (~1.1ms median for 8KB on Node). The dream-home walkthrough was accurate; real prod will differ, but the order of magnitude is locked.

Compiles tokenize.ts + telemetry.ts via tsc into a temp dir, post-patches the JSON import to add Node 22's required import-attribute syntax, then dynamic-imports. Same code path that ships in the worker bundle. This is the verification that wrangler dev would have done if workerd ran in this nested sandbox (it doesn't — workerd dies after declaring ready, likely a Linux capability issue with the container).
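The capture mock could look like this sketch; writeDataPoint mirrors Workers' real AnalyticsEngineDataset method, while the DataPoint shape and the assertion note are assumptions taken from this thread:

```ts
// Hypothetical capture mock for the integration tests.
interface DataPoint {
  blobs?: string[];
  doubles?: number[];
  indexes?: string[];
}

const written: DataPoint[] = [];
const env = {
  ODDKIT_TELEMETRY: {
    writeDataPoint(point: DataPoint): void {
      written.push(point); // capture instead of writing to Analytics Engine
    },
  },
};
// Tests then invoke recordTelemetry(...) and assert on `written`,
// e.g. that doubles positions 3-7 carry the PayloadShape fields.
```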
Two assertions that would have failed against the pre-fix code:
1. The SSE response test now asserts tokenize_ms = 0 (was: only checked bytes_out/tokens_out, missing the spurious non-zero tokenize_ms that the original logic would record on every SSE response).
2. New test 'Bugbot invariant: tokenize_ms is 0 only when encoder did not actually run' explicitly covers the both-empty case (must be 0) and the request-only case (must be a valid finite number). Sketched below.

Both new assertions verify Bugbot's distinction: a 0 from countTokensSafe on empty input is a trivial short-circuit, not a real tokenization. Only non-null results on non-empty input prove the encoder ran. The pre-fix code conflated these and would have polluted the bench-vs-prod A/B comparison with spurious tokenize_ms readings on SSE traffic.

Real-world tokenize_ms on the realistic 8KB integration test: 1.016ms (bench predicted 1.1ms — within 8%). 8/8 cases passing.
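A sketch of the invariant pair, assuming node:assert and a measurePayloadShape(requestText, responseText) surface; the import path and test structure are illustrative only:

```ts
import assert from 'node:assert/strict';
// Path is illustrative: the real test compiles tokenize.ts to a temp dir first.
const { measurePayloadShape } = await import('./tmp/tokenize.js');

// Both-empty case: the encoder never runs, so tokenize_ms must be exactly 0.
const empty = await measurePayloadShape('', '');
assert.equal(empty.tokenize_ms, 0);

// Request-only case: the encoder runs once on the request text, so
// tokenize_ms must be a valid finite number.
const reqOnly = await measurePayloadShape('{"jsonrpc":"2.0","method":"ping"}', '');
assert.ok(Number.isFinite(reqOnly.tokenize_ms));
```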
CRITICAL FIX. A managed-agent smoke test against the preview deployment
caught that doubles 4 (bytes_out), 6 (tokens_out), and 7 (tokenize_ms)
were all zero across every recorded data point. Six telemetry rows
queried, six rows with bytes_out=0.
Root cause: the call site in workers/src/index.ts filtered the response
clone by Content-Type, only cloning when the type included
'application/json'. MCP's Streamable HTTP transport returns
'text/event-stream' (SSE) for tool calls, not JSON. The filter was
silently dropping almost every response, leaving responseClone null and
recording zeros for the entire response side.
This was the same performed-prudence pattern the new canon docs warn
about, applied in micro: I assumed MCP responses would be JSON without
measuring what the SDK actually returns. The smoke test caught it
because canon also prescribes verification before declaring done.
Fix:
1. New helper measureResponseShape(requestText, response) in tokenize.ts (sketched after the test list).
Clones the response, reads the body, runs measurePayloadShape. No
Content-Type filter — read everything. SSE protocol overhead (~10
bytes per event) is negligible against the actual payload size, and
oddkit's responses are bounded (no long-lived streams).
2. Call site in index.ts simplified to use the helper. Drops the
filter, drops the separate clone, drops the responseClone variable.
Cleaner code AND correct behavior.
3. Four new unit tests for measureResponseShape:
- measures application/json responses
- measures text/event-stream responses (this would have caught the
bug pre-merge)
- leaves the original response body intact (clone correctness)
- handles already-consumed body without throwing
12/12 unit tests pass, typecheck clean.
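A sketch of that helper under the names this commit message uses, as if excerpted from tokenize.ts; treat it as illustrative, not the shipped implementation:

```ts
export async function measureResponseShape(
  requestText: string,
  response: Response,
): Promise<PayloadShape | null> {
  try {
    // No Content-Type filter: measure everything, including the
    // text/event-stream responses the dropped filter was discarding.
    const responseText = await response.clone().text();
    return await measurePayloadShape(requestText, responseText);
  } catch {
    // Already-consumed or locked body: skip measurement, never throw.
    // Telemetry must never break MCP requests.
    return null;
  }
}
```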
Methodology note: this fix exists because the smoke test (live MCP
calls + telemetry_public SQL) caught what unit tests missed. The
canon-prescribed verification gate worked exactly as designed —
release-validation-gate (E0008.3) at klappy://canon/constraints/release-validation-gate
mandates independent live smoke for load-bearing surface changes
before merge. The agent dispatch is that smoke.
Third smoke confirmed bytes_in/out and tokens_in/out now populate correctly (357-21319 bytes_out, 142-5398 tokens_out across varied payloads). But double7 (tokenize_ms) is still 0 across every row.

Root cause: Cloudflare Workers' performance.now() is a deterministic timer — it does NOT advance during synchronous CPU work. The mitigation prevents timing-side-channel attacks; the timer only ticks on I/O. Tokenization (countTokensSafe) is pure CPU work. The encoder runs between two reads of performance.now() with no I/O in between, so both reads return the same value and tokenize_ms is always 0 (illustrated below). Tests passed in Node because Node's performance.now() is a real high-resolution timer.

Fix: switch to Date.now(). It always advances, at 1ms resolution. The bench-vs-prod comparison loses sub-millisecond precision (sub-ms tokenizations round to 0) but gains a working signal for any payload above ~5KB, where bench timing exceeded 1ms. Updated the telemetry.ts schema doc comment to document the 1ms resolution and the Workers-specific reason.

Methodology: this is the third Cloudflare Workers gotcha caught in prod that unit tests can't catch — Workers Runtime != Node:
1. b94aaa6 (mine): assumed MCP responses are application/json (they're SSE)
2. 1a555df (mine): assumed clone() inside waitUntil works (body already drained)
3. THIS: assumed performance.now() advances in synchronous code (it doesn't)

Each was caught by the live Managed Agent smoke + telemetry_public SQL, not by typecheck or unit tests. The release-validation-gate is the only thing standing between this branch and a quietly broken prod telemetry pipeline.

8 unit tests still pass. Typecheck clean.
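The failure mode in miniature, with encode and bodyText as stand-ins for the real calls (note the next commit found Date.now() is frozen the same way, so this fix was also insufficient):

```ts
import { encode } from 'gpt-tokenizer/encoding/cl100k_base';

declare const bodyText: string; // stand-in for the request/response text

const t0 = performance.now();
const tokens = encode(bodyText).length; // pure CPU: no I/O occurs here
const t1 = performance.now();
// Node: t1 > t0 (real high-resolution timer).
// Workers: t1 === t0, because the timer only advances on I/O events,
// so tokenize_ms is always 0 for synchronous work.
const tokenize_ms = t1 - t0;
```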
Fourth smoke confirmed bytes_in/out and tokens_in/out work in production (357-21319 bytes_out, 142-5398 tokens_out across varied payload sizes). But tokenize_ms remained 0 across every row even with the Date.now() fix from 279f761.

Root cause discovered by the agent: Cloudflare Workers freezes BOTH performance.now() AND Date.now() during synchronous CPU work. Both timers only advance on network I/O events as a side-channel mitigation (documented at developers.cloudflare.com/workers/runtime-apis/web-standards/). Tokenization is pure CPU work, so any sub-request timing of it always reads 0 in production. This is a structural runtime constraint, not a bug we can patch.

Workarounds considered and rejected:
- Force artificial I/O between reads (KV.list, fetch) — adds real latency to telemetry-only paths, grotesque
- Two writeDataPoint calls with start/end timestamps — over-engineered, doubles write count, complicates queries
- Keep the column as always-0 — actively misleading

Decision: drop tokenize_ms entirely from PayloadShape, the doubles array, schema doc, and tests (final write path sketched below). The bench at workers/test/tokenize.test.mjs already characterized the cost curve (cl100k handles 50 KB in ~1.3 ms on Node v22). bytes_out + tokens_out are sufficient signal — a future maintainer can predict tokenize_ms from the bench curve given the observed payload sizes.

Schema before:
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out, tokenize_ms] // 7 fields
Schema after:
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out] // 6 fields

Companion canon update at klappy/klappy.dev coming in the next commit on that branch — drops the tokenize_ms row from the doubles table and removes the tokenize_ms mention in 'What This Enables'.

Methodology: this is the fourth Workers Runtime != Node behavioral diff caught by live smoke on this branch. Each was unmeasurable from unit tests because Node behaves differently:
1. b94aaa6 (mine, broken): Content-Type filter (MCP returns SSE)
2. 1a555df (mine, broken): clone in waitUntil (body already drained)
3. 279f761 (mine, broken): Date.now() in Workers (frozen too)
4. THIS: drop the unmeasurable column entirely

The release-validation-gate canon doc is the only thing that surfaced each of these — the live preview smoke + telemetry_public SQL caught what no test setup I could ship would have caught. The Workers-runtime gap was real and the gate worked.

Tests:
- 7/7 unit tests pass (workers/test/tokenize.test.mjs)
- 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
- typecheck clean
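What the post-decision write could look like, assuming an AnalyticsEngineDataset binding and the six-field order from the schema note above (variable names are placeholders):

```ts
declare const env: { ODDKIT_TELEMETRY: AnalyticsEngineDataset };
declare const duration_ms: number;
declare const shape: {
  bytes_in: number; bytes_out: number; tokens_in: number; tokens_out: number;
};

env.ODDKIT_TELEMETRY.writeDataPoint({
  doubles: [
    1,                // count
    duration_ms,      // request duration spans I/O, so the frozen-timer issue doesn't apply
    shape.bytes_in,
    shape.bytes_out,
    shape.tokens_in,
    shape.tokens_out, // tokenize_ms is gone: unmeasurable in Workers
  ],
});
```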
Minor bump for payload-shape telemetry (PR #134).

Bumps:
- package.json 0.23.1 -> 0.24.0
- workers/package.json 0.23.1 -> 0.24.0
- package-lock.json 0.23.0 -> 0.24.0 (root drifted one release behind)
- workers/package-lock.json 0.23.1 -> 0.24.0

CHANGELOG.md gains the [0.24.0] entry above [0.23.1], documenting:
- Added: bytes_in/out, tokens_in/out telemetry doubles + helpers
- Changed: drop the Content-Type filter (MCP responses are SSE)
- Removed: tokenize_ms — Workers freezes both perf.now and Date.now
- Fixed: root package-lock.json version drift back-fill

The four Workers Runtime != Node behavioral diffs caught by the five Managed Agent smoke sessions on this branch are listed in the Refs trailer for forensic record.

Tests: 7/7 unit + 6/6 integration pass on bumped state. Typecheck clean (reports as oddkit-mcp-worker@0.24.0).

Per workflow: dedicated chore/release-x.y.z PR. Branch is off feat/telemetry-tokenization HEAD, so it carries the feature commits + the bump together. After merge, feat/telemetry-tokenization can be closed (its commits are already in main via this release branch).
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | oddkit | d023ad6 | Commit Preview URL · Branch Preview URL | Apr 23 2026, 09:30 PM |
Closing — bump consolidated onto #134.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.
Reviewed by Cursor Bugbot for commit d023ad6.
```ts
const cacheTier = tracer.indexSource;
// Clone the response synchronously before returning so the body is
// still available to read inside the deferred waitUntil callback.
const responseClone = response.clone();
```
Unprotected response.clone() can break MCP responses
Medium Severity
The response.clone() call sits outside any try/catch, while the ctx.waitUntil callback's catch block (line 991–993) explicitly upholds the invariant "Telemetry must never break MCP requests." If clone() throws (e.g., the SDK returns a response with an already-disturbed or locked body), the exception prevents return response from ever executing, turning a telemetry-only code path into a user-facing 500 error. The old code had no response.clone() at all, so this is a new risk. Moving the clone inside the existing try/catch (or wrapping it in its own) would preserve the stated safety guarantee.
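One way to satisfy the review, shown as a sketch rather than the repo's actual fix:

```ts
declare const response: Response;

// Keep the telemetry clone from ever throwing into the user-facing path.
let responseClone: Response | null = null;
try {
  responseClone = response.clone();
} catch {
  // Already-disturbed or locked body: skip response measurement;
  // telemetry must never break MCP requests.
}
```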


Release: 0.24.0 (minor)
Carries the payload-shape telemetry feature from
`feat/telemetry-tokenization` plus the version bump. Branch is based on `feat/telemetry-tokenization` HEAD so all 9 commits ride along — when this lands, the feature branch can be closed.

Bumps
- `package.json`
- `workers/package.json`
- `package-lock.json`
- `workers/package-lock.json`

⚠ Root `package-lock.json` had drifted one release behind (`0.23.0` while workers was at `0.23.1`) — back-filled here. Both lockfiles still require manual sync per current tooling; the pre-commit hook only enforces sync between the two `package.json` files.

What's in this release
The full CHANGELOG entry is on the diff. Headline items:
- Added: `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` telemetry doubles via `gpt-tokenizer/encoding/cl100k_base`. Module-level lazy singleton, ~432 KB gzipped, ~6× faster than `@anthropic-ai/tokenizer` per the in-tree bench. All measurement happens in `ctx.waitUntil` so user-facing latency is unchanged.
- Changed: dropped the Content-Type filter; MCP's Streamable HTTP transport returns `text/event-stream`, not `application/json`, and the original filter caused 100% of tool_call responses to record `bytes_out=0`.
- Removed: `tokenize_ms` (formerly `double7`). Cloudflare Workers freezes both `performance.now()` and `Date.now()` between network I/O events as a timing-side-channel mitigation, making sub-request timing of pure-CPU tokenization structurally unmeasurable. The bench at `workers/test/tokenize.test.mjs` characterized the cost curve; future per-call cost is predictable from observed `bytes_out`/`tokens_out` against that curve.
- Fixed: root `package-lock.json` drift back-fill.

Full schema after this release:
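Per the six-field schema settled in the commit history:

```
doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out] // 6 fields
```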
Validation
- 7/7 unit tests pass (`workers/test/tokenize.test.mjs`)
- 6/6 integration tests pass (`workers/test/telemetry-integration.test.mjs`)
- Typecheck clean (reports as `oddkit-mcp-worker@0.24.0`)
- Live Managed Agent smoke (session `sesn_011CaMNujMg9pymcz18JFPp8`) confirmed all four shape fields populate with realistic varied values across distinct tools (`oddkit_catalog`: 21,437 bytes_out / 5,856 tokens_out; `oddkit_time`: 178 bytes_out / 71 tokens_out). `MAX(double7) = 0` confirms `tokenize_ms` is cleanly absent.
Four distinct Workers ≠ Node behavioral diffs surfaced and resolved on this branch, each caught by live smoke (none by unit tests). Listed in the CHANGELOG
Refstrailer with the corresponding agent session IDs.Companion PR (canon)
klappy/klappy.dev#134 — telemetry-governance schema update + two new constraints (
measure-before-you-object,performed-prudence-anti-pattern). Suggested merge order: that one first (governance lands,telemetry_policyreflects new schema immediately), then this one.Sequencing options
feat/telemetry-tokenizationbecomes redundant and can be closed.feat/telemetry-tokenization(feat(telemetry): add bytes_in/out, tokens_in/out, tokenize_ms via gpt-tokenizer #134) first, then rebase this branch → release PR becomes a clean 1-commit diff (just package.json + lockfiles + CHANGELOG).Either works.
Note
Medium Risk
Adds new telemetry measurement and schema fields (bytes/tokens) to the production Workers MCP handler and introduces a new tokenizer dependency, which could affect runtime performance/memory and Analytics Engine dashboards despite being deferred via
waitUntil. Risk is mitigated by defensive try/catch, synchronous response cloning, and new unit/integration tests covering the write path.Overview
Adds payload-shape telemetry to the Workers MCP server: each request now records
bytes_in,bytes_out,tokens_in, andtokens_outas new doubles (double3–double6) using a lazily loadedgpt-tokenizercl100k encoder (workers/src/tokenize.ts), executed inctx.waitUntilto avoid impacting response latency.Updates the telemetry write path to accept the already-read request body string, attach the new payload metrics to every written data point (including batch JSON-RPC), and drops the previously attempted
tokenize_msfield; response measurement no longer gates onContent-Typeso SSE/tool-call responses are included.Bumps versions to
0.24.0(root + workers) and syncs both lockfiles; adds unit + integration tests validating tokenizer behavior and end-to-end telemetry schema/writes.Reviewed by Cursor Bugbot for commit d023ad6. Bugbot is set up for automated code reviews on this repo. Configure here.